AITopics | stochastic shared embedding

Neural Information Processing Systems http://nips.cc/

neural network, probability, sse-se, (10 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Yolo County > Davis (0.14)
North America > Canada (0.04)
Europe > Belgium (0.04)

Industry:

Media > Film (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Stochastic Shared Embeddings: Data-driven Regularization of Embedding Layers

Neural Information Processing Systems, Dec-25-2025, 05:51:26 GMT

In deep neural nets, lower-level embedding layers account for a large portion of the total number of parameters. Tikhonov regularization, graph-based regularization, and hard parameter sharing are approaches that introduce explicit biases into training in the hope of reducing statistical complexity. Alternatively, we propose stochastically shared embeddings (SSE), a data-driven approach to regularizing embedding layers, which stochastically transitions between embeddings during stochastic gradient descent (SGD). Because SSE integrates seamlessly with existing SGD algorithms, it can be used with only minor modifications when training large-scale neural networks. We develop two versions of SSE: SSE-Graph, which uses knowledge graphs of embeddings, and SSE-SE, which uses no prior information. We provide theoretical guarantees for our method and show its empirical effectiveness on 6 distinct tasks, from simple neural networks with one hidden layer in recommender systems to the transformer and BERT in natural language processing. We find that when used along with widely used regularization methods such as weight decay and dropout, our proposed SSE can further reduce overfitting, which often leads to more favorable generalization results.
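The abstract's description of SSE-SE (swapping embeddings at random during SGD, with no prior information) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `sse_se_indices`, the swap probability `p`, and the choice to draw the replacement uniformly from all indices are assumptions made for illustration only.

```python
import random


def sse_se_indices(indices, num_embeddings, p, rng):
    """Hypothetical SSE-SE-style index perturbation.

    Each index is kept with probability 1 - p; otherwise it is replaced by
    an index drawn uniformly from range(num_embeddings). The exact
    transition rule used in the paper is an assumption here.
    """
    out = []
    for idx in indices:
        if rng.random() < p:
            # Swap: look up a randomly chosen embedding instead.
            out.append(rng.randrange(num_embeddings))
        else:
            # Keep the original embedding index.
            out.append(idx)
    return out


rng = random.Random(0)
batch = [3, 17, 256, 999]
# Perturb the indices before the embedding lookup in each SGD step.
noisy_batch = sse_se_indices(batch, num_embeddings=1000, p=0.1, rng=rng)
```

In a training loop, the perturbed indices would be passed to the embedding lookup in place of the originals, and the rest of the SGD step would be unchanged; at evaluation time one would presumably use the original indices.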

data-driven regularization, name change, stochastic shared embedding, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.60)

Stochastic Shared Embeddings: Data-driven Regularization of Embedding Layers

Liwei Wu, Shuqing Li, Cho-Jui Hsieh, James L. Sharpnack

Neural Information Processing Systems, Oct-2-2025, 13:01:51 GMT

In deep neural nets, lower-level embedding layers account for a large portion of the total number of parameters.

artificial intelligence, machine learning, natural language, (14 more...)

Neural Information Processing Systems

Country: North America > United States > California > Yolo County > Davis (0.14)

Industry:

Media > Film (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Stochastic Shared Embeddings: Data-driven Regularization of Embedding Layers

Neural Information Processing Systems, May-31-2025, 13:33:12 GMT

In deep neural nets, lower-level embedding layers account for a large portion of the total number of parameters. Tikhonov regularization, graph-based regularization, and hard parameter sharing are approaches that introduce explicit biases into training in the hope of reducing statistical complexity. Alternatively, we propose stochastically shared embeddings (SSE), a data-driven approach to regularizing embedding layers, which stochastically transitions between embeddings during stochastic gradient descent (SGD). Because SSE integrates seamlessly with existing SGD algorithms, it can be used with only minor modifications when training large-scale neural networks. We develop two versions of SSE: SSE-Graph, which uses knowledge graphs of embeddings, and SSE-SE, which uses no prior information.

artificial intelligence, machine learning, stochastic shared embedding, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.64)

Reviews: Stochastic Shared Embeddings: Data-driven Regularization of Embedding Layers

Neural Information Processing Systems, Jan-22-2025, 21:58:26 GMT

The paper presents a novel and interesting regularization method, a theoretical analysis, and good results. However, I fear its main contributions might be limited to recommendation systems or other fields where knowledge graphs are available or easily constructed, or where, in their absence, it is intuitively reasonable to assume a complete graph. Outside those types of tasks, I did not find the paper's arguments for why other fields or tasks would benefit significantly from such a method intuitively compelling, despite the improved results on some NLP tasks. The simpler version of the regularizer, which assumes a complete graph in the absence of a knowledge graph, permutes embedding indices with a constant*U(1,N) probability. Despite its appealing theoretical properties, it also risks introducing a bias of its own. The results on NLP tasks did not show major improvements, and the paper did not explain why this type of regularizer would be beneficial and effective for different NLP tasks.
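The reviewer's point about knowledge graphs versus a complete graph can be made concrete with a rough sketch of a graph-restricted swap. This is an illustration under stated assumptions, not the paper's algorithm: `sse_graph_indices`, `complete_graph`, the adjacency-dictionary format, and the uniform choice among neighbors are all hypothetical.

```python
import random


def sse_graph_indices(indices, neighbors, p, rng):
    """Hypothetical SSE-Graph-style perturbation.

    With probability p, an index is replaced by one of its neighbors in the
    knowledge graph, chosen uniformly (an assumed transition rule). Indices
    with no neighbors are always kept.
    """
    out = []
    for idx in indices:
        candidates = neighbors.get(idx, [])
        if candidates and rng.random() < p:
            out.append(rng.choice(candidates))
        else:
            out.append(idx)
    return out


def complete_graph(num_embeddings):
    """Adjacency dict in which every index neighbors every other index."""
    return {
        i: [j for j in range(num_embeddings) if j != i]
        for i in range(num_embeddings)
    }


rng = random.Random(0)
# Sparse graph: swaps stay among related items only.
sparse = {0: [1, 2], 1: [0], 2: [0], 3: []}
swapped_sparse = sse_graph_indices([0, 1, 2, 3], sparse, p=0.5, rng=rng)
# Complete graph: any index may be swapped for any other one.
swapped_dense = sse_graph_indices([0, 1, 2, 3], complete_graph(4), p=0.5, rng=rng)
```

With a sparse graph the swaps encode prior knowledge about which items are related; with the complete graph every other index is an equally likely replacement, which is the assumption the reviewer questions for tasks without a natural graph.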

data-driven regularization, knowledge graph, stochastic shared embedding, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.92)

Reviews: Stochastic Shared Embeddings: Data-driven Regularization of Embedding Layers

Neural Information Processing Systems, Jan-22-2025, 21:58:15 GMT

The paper proposes to integrate a stochastic relabelling embedding operator within the training of a neural net. The reviewers and the area chair are convinced of the merits of the approach which comes with a theoretical justification (smoothing the Rademacher complexity in the uniform case) and solid comparative empirical evidence. The visualization of the embeddings and their interpretation (in supplementary material and in the rebuttal) are appreciated. The AC hopes that the authors will take into account the suggestions/questions in the reviews, specifically concerning the scope of the approach and its limitations, when writing the camera-ready version of the paper. Another question which comes to mind is whether the knowledge graph (e.g. as learned from a teacher network) can facilitate the training of a student network, e.g.

data-driven regularization, embedding layer, stochastic shared embedding, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.68)

Stochastic Shared Embeddings: Data-driven Regularization of Embedding Layers

Neural Information Processing Systems, Oct-9-2024, 20:10:54 GMT

In deep neural nets, lower-level embedding layers account for a large portion of the total number of parameters. Tikhonov regularization, graph-based regularization, and hard parameter sharing are approaches that introduce explicit biases into training in the hope of reducing statistical complexity. Alternatively, we propose stochastically shared embeddings (SSE), a data-driven approach to regularizing embedding layers, which stochastically transitions between embeddings during stochastic gradient descent (SGD). Because SSE integrates seamlessly with existing SGD algorithms, it can be used with only minor modifications when training large-scale neural networks. We develop two versions of SSE: SSE-Graph, which uses knowledge graphs of embeddings, and SSE-SE, which uses no prior information.

data-driven regularization, embedding layer, stochastic shared embedding, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.64)

Stochastic Shared Embeddings: Data-driven Regularization of Embedding Layers

Wu, Liwei, Li, Shuqing, Hsieh, Cho-Jui, Sharpnack, James L.

Neural Information Processing Systems, Mar-18-2020, 20:18:50 GMT

In deep neural nets, lower-level embedding layers account for a large portion of the total number of parameters. Tikhonov regularization, graph-based regularization, and hard parameter sharing are approaches that introduce explicit biases into training in the hope of reducing statistical complexity. Alternatively, we propose stochastically shared embeddings (SSE), a data-driven approach to regularizing embedding layers, which stochastically transitions between embeddings during stochastic gradient descent (SGD). Because SSE integrates seamlessly with existing SGD algorithms, it can be used with only minor modifications when training large-scale neural networks. We develop two versions of SSE: SSE-Graph, which uses knowledge graphs of embeddings, and SSE-SE, which uses no prior information.

data-driven regularization, embedding layer, stochastic shared embedding, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.64)
